Robust Text Analysis via Underspecification
Abstract
This paper is concerned with the robust analysis of the discourse structure of a text via underspecification. Most current discourse theories (e.g. Rhetorical Structure Theory (RST) by Mann and Thompson (1988), Abduction by Hobbs et al. (1993), or Segmented Discourse Representation Theory (SDRT) by Asher (1993)) require detailed world and context knowledge to derive the discourse structure. A discourse structure has to be obtained for a given text in every case, and for an ambiguous discourse a large number of structures may be generated. The present approach instead derives an underspecified discourse structure for a text based on a limited set of discourse cues. The discourse structure is further specified only when there is evidence for a discourse relation or a set of discourse relations, for example via a discourse marker. After providing background information on underspecification and SDRT, the paper outlines a general framework for an underspecified discourse grammar. This framework captures scope ambiguities of discourse relations, introduces into the SDRT representation the underspecification of the discourse relation that links two segments, and further specifies the content of an abstract topic node that dominates a segment.
Related resources
Shallow Parsing and Text Chunking: a View on Underspecification in Syntax
This paper illustrates a technique of shallow parsing named “text chunking” whereby “parse incompleteness” is reinterpreted as “parse underspecification”. A text is chunked into structured units which can be identified with certainty on the basis of available knowledge. The chunking process stops at that level of granularity beyond which the analysis gets undecidable. We argue that a chunked sy...
A Semantic Explication of Information Status and the Underspecification of the Recipients’ Knowledge
This article presents a survey of and an investigation into the notion of information status. Based on insights from DRT and presupposition theory, a new variant of IS taxonomies is developed, considering issues such as accommodation and underspecification of text with regard to hearer knowledge.
Towards a Robust Deep Language Understanding System
We propose a system that bridges the gap between the two major approaches toward natural language processing: robust shallow text processing and domain-specific (often linguistically-based) deep understanding. We propose to use an existing linguistically motivated deep understanding system as the core and to leverage statistical techniques and external resources such as world knowledge to broad...
A Cascaded Finite-State Parser for German
The paper presents two approaches to partial parsing of German: a tagger trained on dependency tuples, and a cascaded finite-state parser (Abney, 1997). For the tagging approach, the effects of choosing different representations of dependency tuples are investigated. Performance of the finite-state parser is boosted by delaying syntactically unsolvable disambiguation problems via underspecifica...
A Comparing between the impacts of text based indexing and folksonomy on ranking of images search via Google search engine
Background and Aim: The purpose of this study was to compare the impact of text-based indexing and folksonomy on image retrieval via the Google search engine. Methods: This study used an experimental method. The sample consisted of 30 images extracted from the book “Gray anatomy”. The research was carried out in 4 stages; in the first stage, images were uploaded to an “Instagram” account so the images are tagge...